42 research outputs found

Using registries to integrate bioinformatics tools and services into workbench environments

Author: Ison Jon
Kalaš Matúš
Ménager Hervé
Rapacki Kristoffer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

The diversity and complexity of bioinformatics resources present significant challenges to their localisation, deployment and use, creating a need for reliable systems that address these issues. Meanwhile, users demand increasingly usable and integrated ways to access and analyse data, especially within convenient, integrated “workbench” environments. Resource descriptions are the core element of registry and workbench systems, which are used both to help the user find and comprehend available software tools, data resources, and Web Services, and to localise, execute and combine them. The descriptions are, however, hard and expensive to create and maintain, because they are volatile and require an exhaustive knowledge of the described resource, its applicability to biological research, and the data model and syntax used to describe it. We present here the Workbench Integration Enabler, a software component that will ease the integration of bioinformatics resources in a workbench environment, using their description provided by the existing ELIXIR Tools and Data Services Registry.

University of Bergen

Springer - Publisher Connector

NORA - Norwegian Open Research Archives

Online Research Database In Technology

Tools and data services registry: a community effort to document bioinformatics resources

Author: et al. 69 authors
Ison Jon
Kalaš Matúš
Ménager Hervé
Rapacki Kristoffer
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/08/2016

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.

University of Bergen

FreeContact: fast and free software for protein contact prediction from residue co-evolution

Author: Hopf Thomas A
Kaján László
Kalaš Matúš
Marks Debora S
Rost Burkhard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: 20 years of improved technology and growing sequences now render residue-residue contact constraints in large protein families through correlated mutations accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software. Results: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library “libfreecontact”, complete with command line tool “freecontact”, as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability. Conclusions: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

University of Bergen

Crossref

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

NORA - Norwegian Open Research Archives

Galaxy: A Decade of Realising CWFR Concepts

Author: Coppens Frederik
Eguinoa Ignacio
Fouilloux Anne
Grüning Björn
Kalaš Matúš
Serrano-Solano Beatriz
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2022

Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.

University of Bergen

Identifying elemental genomic track types and representing them uniformly

Author: Abul Osman
Frigessi Arnoldo
Gundersen Sveinung
Hovig Eivind
Kalaš Matúš
Sandve Geir Kjetil
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. Results: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. Conclusions: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.

University of Bergen

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NORA - Norwegian Open Research Archives

EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats

Author: Bolser Dan
Ison Jon
Jonassen Inge
Kalaš Matúš
Lopez Rodrigo
Malone James
McWilliam Hamish
Pettifer Steve
Rice Peter
Uludag Mahmut
Publication venue: 'Oxford University Press (OUP)'
Publication date: 17/11/2015

Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required. Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations. Availability: The latest stable version of EDAM is available in OWL format from http://edamontology.org/EDAM.owl and in OBO format from http://edamontology.org/EDAM.obo. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. For documentation and license please refer to http://edamontology.org. This article describes version 1.2 available at http://edamontology.org/EDAM_1.2.owl.

University of Bergen

The Genomic HyperBrowser: an analysis web server for genome-scale data

Author: Clancy Trevor
Drabløs Finn
Ferkingstad Egil
Frigessi Arnoldo
Glad Ingrid Kristine
Gunathasan Krishanthi
Gundersen Sveinung
Holden Lars
Holden Marit
Hovig Johannes Eivind
Johansen Morten
Kalaš Matúš
Lien Tonje Gulbrandsen
Liestøl Knut
Nygaard Vegard
Nygård Ståle
Paulsen Jonas
Rydbeck Halfdan
Rye Morten Beck
Sandve Geir Kjetil
Trengereid Kai
Publication venue: 'Oxford University Press (OUP)'
Publication date: 07/04/2016

The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.

University of Bergen

The Genomic HyperBrowser: an analysis web server for genome-scale data

Author: Clancy Trevor
Drabløs Finn
Ferkingstad Egil
Frigessi Arnoldo
Glad Ingrid Kristine
Gunathasan Krishanthi
Gundersen Sveinung
Holden Lars
Holden Marit
Hovig Johannes Eivind
Johansen Morten
Kalaš Matúš
Lien Tonje Gulbrandsen
Liestøl Knut
Nygaard Vegard
Nygård Ståle
Paulsen Jonas
Rydbeck Halfdan
Rye Morten Beck
Sandve Geir Kjetil
Trengereid Kai
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

PubMed Central

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

EDAM-bioimaging : The ontology of bioimage informatics operations, topics, data, and formats

Author: Bogovic John
Chessel Anatole
Colombelli Julien
Dufour Alexandre
Gaignard Alban
Golani Ofra
Hörl David
Ison Jon
Jones Martin
Kalaš Matúš
Kirschmann Moritz
Levet Florian
Lindblad Joakim
Miura Kota
Moore Josh
Munck Sebastien
Paavolainen Lassi
Paul-Gilloteaux Perrine
Plantard Laure
Rössler Fabienne
Sampaio Paula
Scholz Leandro
Sladoje Nataša
Waithe Dominic
Zhang Chong
Publication venue: HAL CCSD
Publication date: 02/02/2019

The ontology of bioimage informatics operations, topics, data, and formats. What? EDAM-bioimaging is an extension of the EDAM ontology, dedicated to bioimage analysis, bioimage informatics, and bioimaging. Why? EDAM-bioimaging enables interoperable descriptions of software, publications, data, and workflows, fostering reliable and transparent science. How? EDAM-bioimaging is developed in a community spirit, in a welcoming collaboration between numerous bioimaging experts and ontology developers. How can I contribute? We need your expertise! You can help by reviewing parts of EDAM-bioimaging, posting comments with suggestions, requirements, or needs for clarification, or participating in a Taggathon or another hackathon. Please see https://github.com/edamontology/edam-bioimaging#contributing. EDAM-bioimaging is developed in an interdisciplinary open collaboration supported by the hosting institutions, participating individuals, and NEUBIAS COST Action (CA15124) and ELIXIR-EXCELERATE (676559) funded by the Horizon 2020 Framework Programme of the European Union. https://github.com/edamontology/edam-bioimaging @edamontology

Community-driven development for computational biology at Sprints, Hackathons and Codefests

Author: Afgan Enis
Banck Michael
Bonnal Raoul JP
Booth Timothy
Chapman Brad A
Chilton John
Cock Peter JA
Guimera Roman Valls
Gumbel Markus
Harris Nomi
Holland Richard
Kaján László
Kalaš Matúš
Katayama Toshiaki
Kibukawa Eri
Möller Steffen
Powel David R
Prins Pjotr
Quinn Jacqueline
Sallou Olivier
Seemann Torsten
Sloggett Clare
Soiland-Reyes Stian
Spooner William
Steinbiss Sascha
Strozzi Francesco
Tille Andreas
Travis Anthony J
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. 
We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

Aberdeen University Research

University of Bergen

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

NORA - Norwegian Open Research Archives

University of Melbourne Institutional Repository

NERC Open Research Archive